Formal Concept Analysis for Knowledge
Discovery from Biological Data 


Khalid Raza 


Abstract Due to rapid advances in high-throughput techniques, such as microarrays
and next-generation sequencing, biological data are growing exponentially. The
current challenge in computational biology and bioinformatics research is how to
analyze these huge volumes of raw biological data to extract biologically
meaningful knowledge. This chapter presents applications of formal concept
analysis (FCA) to the analysis of, and knowledge discovery from, biological data,
including gene expression discretization, gene co-expression mining, gene expression
clustering, finding genes in gene regulatory networks, enzyme/protein classification,
binding site classification, and so on. It also presents a list of FCA-based software
tools applied in the biological domain and covers the challenges faced so far.


1 Introduction 

Since the Human Genome Project, biological data have grown at an unprecedented
rate. Thanks to advances in high-throughput technologies, such as microarrays
and next-generation sequencing, it is possible to produce high-quality biological
data rapidly. Biological data can be broadly classified as genomic,
transcriptomic and proteomic. For example, gene expression data are transcriptomic
data that quantify the state of genes in a cell. When gene expression data
are analyzed properly, they may reveal many hidden cellular processes and much
biological knowledge. Such knowledge discovery from biological data may lead to a
better understanding of disease mechanisms and, in turn, guide better diagnosis and
therapy.

Formal Concept Analysis (FCA), introduced by R. Wille in the early 1980s, is a
method based on lattice theory for the analysis of binary relational data. Since its
inception, FCA has found potential applications in many areas, including
data mining, knowledge discovery and machine learning. Like other computational
techniques, FCA has also been applied to microarray analysis, gene expression
mining, gene expression clustering, finding genes in gene regulatory networks,
enzyme/protein classification, binding site classification, and so on. In this chapter,
we present the current status of FCA for the analysis of, and knowledge discovery
from, biological data and also cover the challenges faced so far.


2 Biological Databases 

Owing to the availability of high-throughput techniques, biological data are being
generated at an exponential rate, and modern biology has turned into a data-rich
science. Important kinds of biological data include nucleotide and protein sequences,
protein 3D structures produced by X-ray crystallography and NMR, metabolic pathways,
complete genomes and maps, gene expression and protein-protein interactions, and
so on.

Biological databases are broadly divided into sequence databases and structure
databases. Sequence databases cover both DNA and protein, whereas structure
databases cover proteins only. Today, most biological databases
are freely available to researchers. In general, biological databases can be
classified as primary, secondary and composite databases. A primary database stores
information on either sequence or structure, for example, UniProt and PIR for
protein sequences, GenBank and DDBJ for genome sequences, and the Protein Data
Bank for protein structures. A secondary database stores information derived
from primary databases, such as conserved sequence information and active-site
residues of protein families obtained by multiple sequence alignment of a
set of related proteins. SCOP, CATH and PROSITE are a few examples of
secondary databases. A composite database is a collection of several different
primary database sources, which avoids the need to search multiple databases.
The National Center for Biotechnology Information (NCBI) is the main
central host that links multiple database sources and makes these resources freely
available. For more information about biological databases, refer to the cited
tutorials.


3 Microarray Analysis 

3.1 Mining gene expression data 

Owing to rapid advances in high-throughput technologies such as microarrays and
next-generation sequencing, transcriptomic data are being produced at an
unprecedented rate. However, analysis and interpretation of these data remain a
challenge for researchers because of the complexity of biological systems. The
motivation and biological background to be considered for gene expression mining are:
i) a single gene usually participates in many biological processes, i.e., it has
several functions; ii) a biological process involves a small subset of genes; iii) a
biological process of interest may be active in many, all or none of the situations in
a given dataset; and iv) genes that are differentially expressed across samples are
not frequent.

In transcriptomics, researchers routinely analyze the expression levels of genes in
different situations, such as tumor samples versus normal samples. Formal Concept
Analysis (FCA) has been successfully applied in the field of transcriptomics. Some
studies have used FCA to identify sets of genes that share the same transcriptional
behavior. Owing to the availability of large gene expression datasets, it is
possible to apply data mining tools to identify patterns of interest in gene
expression data. One of the most widely used data mining techniques is association
rule mining, which can be applied to the analysis of gene expression data. Association
rules may uncover biologically relevant associations between genes, or between
different environmental conditions and gene expression. An association rule can be
written in the form $S_1 \Rightarrow S_2$, where $S_1$ and $S_2$ are disjoint sets of
data items. The set $S_2$ is likely to occur whenever the set $S_1$ occurs. Here, the
data items may include highly expressed or repressed genes, or other relevant facts
describing the cellular environment of genes, such as the diagnosis of a disease
sample. Association rule mining has been applied to gene expression data by many
researchers.
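
A rule of this kind is usually scored by its support and confidence. The following minimal Python sketch shows how both can be computed. The item names (such as `geneA_up`) and the four-sample dataset are made up for illustration, and the code is a generic illustration rather than the procedure of any cited study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate how often consequent holds when antecedent holds (rule S1 => S2)."""
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, antecedent | consequent) / base


# Hypothetical samples: each is the set of facts observed in one experiment.
samples = [
    frozenset({"geneA_up", "geneB_up", "tumor"}),
    frozenset({"geneA_up", "geneB_up", "tumor"}),
    frozenset({"geneA_up", "normal"}),
    frozenset({"geneB_up", "normal"}),
]
s1 = frozenset({"geneA_up", "geneB_up"})
s2 = frozenset({"tumor"})
rule_support = support(samples, s1 | s2)
rule_confidence = confidence(samples, s1, s2)
```

In this toy dataset the rule {geneA_up, geneB_up} => {tumor} is supported by half of the samples and holds in every sample where its left-hand side occurs.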

In this section, we discuss only the application of different variants of
FCA to gene expression data mining, especially the extraction of groups of
co-expressed genes. Most methods for mining co-expressed genes
are based on binary biclustering. Here, the data are scaled using a
single threshold: expression values above the threshold
are considered over-expressed and represented by 1; otherwise they are considered
under-expressed and represented by 0. Once the gene expression values are
discretized into a binary table, strong relationships carrying biologically
meaningful information can be extracted. Kaytoue-Uberall et al. (2008) proposed
interval-based FCA to extract groups of co-expressed genes. Given a set of genes $G$,
a set of situations $S$, and a set of ordered intervals $T$, $(g, (s, t)) \in I$, where
$g \in G$, $s \in S$, $t \in T$ and $I$ is a binary relation, means that the expression
value of gene $g$ lies in the interval of index $t$ for situation $s$. Hence, a formal
concept of the context $(G, S \times T, I)$ represents a group of genes whose
expression values fall in the same intervals for a common set of situations. However,
a priori determination of these intervals is difficult.
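
The two scalings described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the expression values, the threshold and the interval bounds are invented, and the helper names (`binarize`, `interval_context`, and so on) are hypothetical rather than taken from the cited work.

```python
def binarize(expr, threshold):
    """Single-threshold scaling: 1 if the value is above the threshold, else 0."""
    return {g: {s: int(v > threshold) for s, v in row.items()}
            for g, row in expr.items()}


def interval_index(value, upper_bounds):
    """Index t of the first interval whose upper bound is >= value."""
    for t, upper in enumerate(upper_bounds):
        if value <= upper:
            return t
    return len(upper_bounds)  # open-ended last interval


def interval_context(expr, upper_bounds):
    """Context (G, S x T, I): gene g has attribute (s, t) when its
    expression value in situation s falls in interval t."""
    return {g: {(s, interval_index(v, upper_bounds)) for s, v in row.items()}
            for g, row in expr.items()}


def common_attributes(context, genes):
    """Attributes shared by every gene in a non-empty collection of genes."""
    return set.intersection(*(context[g] for g in genes))


def genes_sharing(context, attributes):
    """Genes that have every attribute in the given set."""
    return {g for g, attrs in context.items() if attributes <= attrs}


# Hypothetical expression values for three genes in two situations.
expr = {
    "g1": {"s1": 0.2, "s2": 2.5},
    "g2": {"s1": 0.4, "s2": 2.9},
    "g3": {"s1": 1.7, "s2": 2.6},
}
binary = binarize(expr, threshold=1.0)
context = interval_context(expr, upper_bounds=[1.0, 2.0])
shared = common_attributes(context, ["g1", "g2"])
group = genes_sharing(context, shared)
```

Applying `common_attributes` and then `genes_sharing` is the pair of derivation operators of FCA; a gene set that is returned unchanged by this round trip, together with its shared attributes, forms a formal concept.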

Messai et al. (2008) proposed an interval-free FCA-based method to cluster
gene expression values. However, this algorithm does not handle large data sets,
and no link to interordinal scaling was made. To overcome these problems,
Kaytoue-Uberall et al. (2009) introduced two FCA-based methods for clustering
gene expression data. The first method is based on interordinal scaling and the
second is based on pattern structures, which require adapting the algorithms to
compute with interval algebra. Of these two methods, the second was shown to be more
computationally efficient and to provide more readable results. The algorithms were
tested on microarray gene expression data of the fungus Laccaria bicolor taken from
the Gene Expression Omnibus database (GSE9784), composed of 22,294 genes and five
different conditions. For dimension reduction, the Cyber-T tool was used to filter
the dataset, which returned 10,225 genes.
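
The relationship between the two methods can be illustrated with a small Python sketch. It assumes the standard textbook formulation, in which interordinal scaling uses attributes of the form "<= w" and ">= w", and in which the similarity of two intervals in an interval pattern structure is their convex hull. These definitions were not checked against the cited papers, and the function names are hypothetical.

```python
def interordinal_attributes(value, domain):
    """Attributes '<= w' and '>= w' (for every w in domain) that hold for value."""
    attrs = set()
    for w in domain:
        if value <= w:
            attrs.add(("<=", w))
        if value >= w:
            attrs.add((">=", w))
    return attrs


def shared_interval(values, domain):
    """Tightest interval [lo, hi] described by the attributes common to all values.

    Assumes values is non-empty and every value lies within the domain range.
    """
    common = set.intersection(
        *(interordinal_attributes(v, domain) for v in values)
    )
    lo = max(w for op, w in common if op == ">=")
    hi = min(w for op, w in common if op == "<=")
    return lo, hi


def interval_meet(a, b):
    """Similarity of two intervals taken as their convex hull (assumed definition)."""
    return (min(a[0], b[0]), max(a[1], b[1]))


# Hypothetical expression values observed in one situation.
domain = [4, 5, 6]
via_scaling = shared_interval([4, 5], domain)
via_intervals = interval_meet((4, 4), (5, 5))
```

Both routes describe the same interval for the pair of values, but the interval version works on two numbers per situation instead of a long list of binary attributes, which gives some intuition for why the pattern-structure method is reported as more efficient and more readable.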

DNA methylation affects the expression of genes, and its dysregulation may cause
several cancers. Many investigations have observed that hypomethylation
of DNA is associated with many cancers, including breast
cancer. Amin et al. (2012) applied FCA to mine hypomethylated genes
among breast cancer tumors. They constructed formal concept lattices with
significantly hypomethylated genes for every breast cancer subtype. The constructed
lattice reflects the biological relationships among breast cancer tumor subtypes. The
proposed filter method has two stages: non-specific filtering and specific filtering.
The non-specific filtering step determines the hypomethylated CpGs by computing the
difference in mean methylation level relative to the corresponding adjacent
normal tissue. The second stage (specific filtering) receives the output of the first
stage as input and performs a one-sample Kolmogorov-Smirnov test to check the
normality of each breast cancer subtype. If the data follow a normal distribution,
a paired t-test is applied; otherwise the Wilcoxon signed-rank test is applied.
Once the hypomethylated genes have been filtered, FCA is applied
to determine breast cancer subtypes. Here, a Java-based FCA software tool
called ConExp was used to generate the lattice diagram.


3.2 Clustering gene expression data 

For grouping sets of genes and/or experimental conditions with similar
gene expression patterns, clustering algorithms are the most popular
method. Some of the most widely used clustering algorithms are hierarchical
clustering, k-means, self-organizing maps, fuzzy c-means, and so on. However, FCA has
also been used for grouping genes, as an alternative to clustering. Choi
et al. (2008) proposed an FCA-based approach for grouping genes based on their
expression patterns. FCA builds a lattice from the gene expression data together
with some additional biological information, where each vertex corresponds
to a subset of genes grouped together on the basis of their expression values and
some other functional information. The lattice structures of gene sets are assumed
to reflect biological relationships in the gene expression dataset. Here, similarities
and dissimilarities between different experiments are determined by comparing the
corresponding lattices. This approach consists of three main steps: i) building a
binary relation, ii) constructing a concept lattice, and iii) defining a distance
measure and comparing the lattices. In the first step, the objects are genes and the
attributes are their discretized gene expression values and biological properties.
In the second step, a concept lattice is constructed for each experiment from its
binary relation using a concept lattice algorithm. Finally, the third step calculates
distances and compares the lattices. The work of Choi et al. (2008) is an
attempt to apply FCA to gene clustering, but the distance measure employed was
quite basic and did not properly exploit the properties of the lattice structure.
Hence, other possible distance measures, such as spectral distance or a distance
based on the maximal common sublattice, could also be investigated. In addition to
global lattice comparison, local structures (sublattices) could be investigated, which
may assist in the identification of particular biological pathways.
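
As a rough illustration of the third step, the sketch below compares two lattices by the Jaccard distance between their sets of concept extents. This measure is an assumption chosen for simplicity; it is not the distance used by the cited authors, and the gene names and extents are hypothetical.

```python
def lattice_distance(extents_a, extents_b):
    """Jaccard distance between two sets of concept extents.

    Each argument is a set of frozensets of gene names, one frozenset per
    formal concept of the corresponding lattice.
    """
    union = extents_a | extents_b
    if not union:
        return 0.0
    return 1.0 - len(extents_a & extents_b) / len(union)


# Hypothetical concept extents obtained from two experiments.
experiment_1 = {
    frozenset({"g1"}),
    frozenset({"g1", "g2"}),
    frozenset({"g1", "g2", "g3"}),
}
experiment_2 = {
    frozenset({"g3"}),
    frozenset({"g1", "g2"}),
    frozenset({"g1", "g2", "g3"}),
}
distance = lattice_distance(experiment_1, experiment_2)
```

A measure like this ignores the order relation between concepts, which is exactly the kind of limitation that structure-aware alternatives such as sublattice-based distances aim to address.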

Melo and collaborators proposed an FCA-based approach combined with
association rules and visual analytics to find overlapping groups of genes in gene
expression data, and analyzed them in an analytical tool called CUBIST. The workflow
of CUBIST involves querying a semantic database and transforming the result into
a formal context, which is then visualized as a concept lattice and associated charts.
The CUBIST tool addresses the challenges of gene expression analysis by filtering
and grouping large datasets, supporting interactive exploration of the data and
presenting various relevant statistics.


3.3 Clustering multi-experiment expression data 

Owing to the availability of high-throughput techniques, a large number
of gene expression datasets are now available. How to combine datasets from multiple
microarray experiments is an open research question. Many
recent studies suggest that the analysis and integration of multi-experiment datasets
can be expected to give more accurate, reliable and robust results, because
integrated datasets are based on a larger number of gene expression samples and
the effects of individual study-specific biases are reduced. For the consensus
integration of multi-experiment expression data, FCA has been successfully applied
by Hristoskova and collaborators. They proposed a generic consensus clustering
approach that applies FCA to consolidate and analyze clustering solutions obtained
from multiple microarray experiments. Initially, the datasets are divided into
multiple groups of related experiments based on some predefined criteria. In the next
step, a consensus clustering technique is applied to each group, which results in one
clustering solution per group. These solutions are then pooled together and analyzed
by FCA, which enables valuable insights to be extracted from the data and generates
a gene partition over all the experiments. The FCA-enhanced consensus clustering
algorithm proposed by Hristoskova and collaborators is depicted in Fig. 1. The
algorithm is divided into three steps: initialization, clustering and FCA-based
analysis. In the initialization step, the multi-experiment data are divided into r
groups of related datasets. The clustering step applies consensus clustering, which
generates r different solutions. The FCA-based analysis step constructs a concept
lattice that partitions the genes into a set of disjoint clusters, as shown in Fig. 1.
The advantages of the FCA-enhanced clustering approach are as follows: i) it uses all
the data, allowing each group of related experiments to have a different set of genes,
i.e., the total set of studied genes is not limited to those present in all the
datasets; ii) it can be better tuned to each group by identifying the initial number
of clusters for each group of related experiments, depending upon the number,
composition and quality of the expression profiles; and iii) the problem of ties is
avoided by applying FCA to analyze all the partitioning results together and find the
final clustering solution representative of the entire experiment collection.

Fig. 1 Schematic representation of the FCA-enhanced consensus clustering algorithm:
1. Initialization step: microarray data realized under different experimental
conditions are divided into r groups of related datasets.
2. Clustering step: consensus clustering generates r different clustering solutions.
3. FCA-based analysis step: the constructed concept lattice partitions the genes into
a set of disjoint clusters.
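
The FCA-based analysis step can be approximated by a simplified Python sketch. It treats every (solution index, cluster label) pair as an attribute of a gene and groups genes with identical attribute sets. This is a simplified reading of the description above, not the published algorithm, and the gene names and cluster labels are hypothetical.

```python
from collections import defaultdict


def fca_partition(solutions):
    """Partition genes by their full set of (solution index, cluster label) attributes.

    solutions is a list of dicts, one per group of related experiments,
    mapping gene name -> cluster label. A gene may be absent from a group.
    """
    attributes = defaultdict(set)
    for index, solution in enumerate(solutions):
        for gene, label in solution.items():
            attributes[gene].add((index, label))

    groups = defaultdict(set)
    for gene, attrs in attributes.items():
        groups[frozenset(attrs)].add(gene)
    # Sort for a deterministic result.
    return sorted(sorted(members) for members in groups.values())


# Hypothetical clustering solutions for r = 2 groups of experiments.
solution_1 = {"g1": "A", "g2": "A", "g3": "B", "g4": "B"}
solution_2 = {"g1": "X", "g2": "X", "g3": "X", "g4": "Y", "g5": "Y"}
partition = fca_partition([solution_1, solution_2])
```

Note that g5, which is measured in only one group of experiments, still receives a cluster, which mirrors the first advantage listed above: the gene set is not restricted to genes present in all datasets.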

Another application of FCA to knowledge discovery and
knowledge integration from gene expression data is due to Benabderrahmane
(2014) [17], who introduced a symbolic data mining approach
based on FCA, involving biclustering of genes. First, the datasets are represented
as a formal context (objects × attributes), where the objects are genes and the
attributes are their expression profiles plus additional information, such as the GO
terms that annotate them, the pathways they are involved in and their genetic
interactions. The algorithm has eight steps; its outline is depicted in Fig. 2. The
algorithm integrates different kinds of data, such as genes that have similar
expression profiles and share similar biological functions (Gene Ontology), and
knowledge bases of pathways and interactions (KEGG, BioGrid, STRING, etc.).


3.4 Gene expression data comparison 


Fig. 2 An overview of the framework proposed by Benabderrahmane (2014) [17]

Finding and understanding the similarities among diseases is an important research
problem in translational bioinformatics. Understanding disease similarities
may help in refining disease classification, identifying common etiologies of
comorbidities in genetic studies, finding analogies between closely related diseases
and, finally, identifying common treatments. Bhavnani and collaborators applied
a network analysis approach to find similarities among renal diseases using gene
expression data.

In addition to many other computational techniques, FCA has also been applied to
finding disease similarities. The work of Keller et al. (2012) shows the application
of FCA to the identification of disease similarity. They identified formal concepts
from gene-disease associations; these concepts indicate hidden relationships among
diseases that have the same set of associated genes, and among genes that are
associated with the same set of diseases. The FCA approach has advantages over the
network analysis approach: i) FCA allows the representation of relationships among
several diseases; ii) it provides results in algebraic form, allowing relationships
among concepts to be considered; and iii) additional gene annotations can be added to
refine concepts, which assists in the identification of functional gene relationships
within disease groups. FCA was applied to a renal disease dataset, where it found
unexpected and promising relationships among diseases, but it suffers from a few
disadvantages. The main difficulty with FCA is that many of the formal concepts may
not be useful, because only a few of them indicate relationships.
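
To make the notion of a formal concept over gene-disease associations concrete, the following Python sketch enumerates all concepts of a tiny context by brute force. The disease and gene names are hypothetical, and the exhaustive enumeration is only suitable for toy examples; it is not the algorithm used in the cited study.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts of a context given as object -> set of attributes.

    Every subset of objects is closed, so the running time is exponential
    in the number of objects.
    """
    objects = sorted(context)
    all_attributes = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by every object of the subset.
            intent = set(all_attributes)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has the whole intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


# Hypothetical gene-disease associations: disease -> associated genes.
associations = {
    "d1": {"gA", "gB"},
    "d2": {"gA", "gB", "gC"},
    "d3": {"gC"},
}
concepts = formal_concepts(associations)
```

Here the concept with extent {d1, d2} and intent {gA, gB} states that these two diseases share exactly this pair of associated genes, which is the kind of hidden relationship the approach is meant to surface.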
3.5 Identifying genes of gene regulatory networks

Gene regulatory networks (GRNs) are biological networks describing
the interactions among a set of genes in the form of a graph, where nodes represent
genes and edges define their regulatory interactions. Understanding GRNs helps in
understanding interactions among genes and biological and environmental effects, and
in identifying target genes for drugs against diseases. GRNs have proved to
be a very useful tool for describing and explaining complex dependencies between
key developmental transcription factors (TFs), their target genes and regulators.
For a better understanding of a GRN, it is necessary
to know the set of genes belonging to it. Identifying this set of genes correctly
is a challenging task, even for small subnetworks. In fact, only a few genes of a
GRN are known and the rest are guessed on the basis of experience or informed
speculation [22]. Hence, it is better to rely on experimental data to support these
guesses.

Gebert and collaborators [22] presented a new FCA-based method to detect
unknown members of a GRN using time-series gene expression data. Suppose that
$G = \{g_1, g_2, \ldots, g_n\}$ is the set of all genes in an organism and
$S \subseteq G$ is a set of seed genes. The goal is to find a subset
$S' \subseteq G \setminus S$ of genes that interact strongly with the
GRN defined by the set $S$. Let $R \subseteq G \times G$ be a relation of known
interactions and $M$ an $n \times l$ matrix that consists of time-series gene
expression profiles of length $l$. If a pair $(g_i, g_j) \in R$, then it is known that
$g_i$ and $g_j$ interact with each other. The
FCA-based approach proposed by Gebert et al. (2008) [22] has three main steps,
described as follows. The first is a preprocessing step that uses the relation $R$ to
obtain an initial list of interesting genes. If interaction data are not available,
this step is skipped and the entire gene set $G$ is taken as the initial list. In the
second step, a concept lattice is constructed from the gene expression data, which
reduces the number of genes on the initial list. The last step computes probabilities
for the correlation coefficients between the genes that result from the second step
and the genes of $S$, in order to obtain a list of significant interactions.
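
The flavour of the last step can be conveyed with a short Python sketch that correlates candidate genes with seed genes. It is a deliberate simplification: a fixed threshold on the absolute Pearson coefficient stands in for the probability computation mentioned above, and the profiles, gene names and the `min_abs_r` parameter are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (norm_x * norm_y)


def candidate_interactions(profiles, seeds, candidates, min_abs_r):
    """Return (candidate, seed, r) triples whose |r| reaches the threshold."""
    hits = []
    for candidate in candidates:
        for seed in seeds:
            r = pearson(profiles[candidate], profiles[seed])
            if abs(r) >= min_abs_r:
                hits.append((candidate, seed, r))
    return hits


# Hypothetical time-series profiles of length l = 4.
profiles = {
    "seed1": [1.0, 2.0, 3.0, 4.0],
    "cand1": [2.0, 4.0, 6.0, 8.0],
    "cand2": [3.0, 1.0, 3.0, 1.0],
}
hits = candidate_interactions(profiles, ["seed1"], ["cand1", "cand2"], min_abs_r=0.9)
```

Only `cand1`, whose profile rises in step with the seed gene, passes the threshold in this example; a statistical significance test would replace the fixed cut-off in a faithful implementation.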


4 Classification and prediction of enzymes, ligand and 
domain-domain identification 

Classification and the study of relations in FCA are based on
objects and various types of related attributes (binary, nominal, ordinal, etc.); FCA
is therefore quite helpful for computational scientists working on biological data,
who may wish to skip the internal details. With several advantages, including a strong
mathematical basis, FCA serves in several applications to explore biological data,
such as enzyme classification and the identification of important protein domains
(including protein binding sites) and related drug molecules. FCA has also been
reported to be useful in the integration of biological activity with chemical spaces.
This list is not exhaustive; FCA has
also been used to understand the structural classification of glasses, among
other studies. In this section, we discuss some important applications of FCA to
the classification of enzymes, binding site identification, the discovery of ligands
as drug molecules, and so on.


4.1 Enzyme classification 

Enzymes are proteins that catalyse biological reactions, and they are named and
classified according to the reactions they catalyse. For example, hydrolases are
enzymes that catalyse hydrolysis, i.e., the cleavage of bonds through the addition of
water molecules. Although the sequences of most enzymes are available in numerous
biological databases, it is a tedious task to predict the function of an enzyme from
its sequence, because small sequence variations can yield different activities.
Considering that new enzyme families may emerge, an effort was made to classify
enzymes using FCA in a way that also handles enzymes that do not belong
to any known family [24]. The authors note that it is easier to predict the
superfamily of a protein than its family. In this study, the labelled
and unlabelled enzyme sequences were the objects, whereas the attributes represented
enzyme blocks. Enzyme blocks are formed by sequential arrangements of amino
acids that correspond to specific functions, such as catalytic sites, residues lining
important pockets, or binding sites. With this method of classification, more than
half of the unlabelled sequences were correctly classified.

Another attempt at protein classification using FCA was made by
Han and collaborators [25]. They proposed an FCA-based approach to protein
classification that uses protein domain and Gene Ontology (GO) annotation
information. Protein domains represent the evolutionary information forming a
protein, while Gene Ontology describes other properties of proteins, including
protein structure, molecular interactions, etc. Han and collaborators [25] applied a
tripartite lattice to capture the relationships among proteins, domains and GO terms.
With the help of the tripartite lattice, they classified proteins from their domain
composition and the corresponding GO term descriptions. Using the tripartite lattice,
they extracted concrete information about domains that co-occur in proteins, since
such domains are more likely to exhibit common functions, as annotated by GO terms.


4.2 Binding site identification 

The identification of protein binding sites (PBS) and ligand binding sites is vital
to the study of protein-protein and protein-ligand interactions, respectively. This
eventually helps medical science identify better drugs or therapies for several
important diseases. There are several ways to identify binding sites. Most commonly,
a protein docking protocol helps identify the binding site by forming a complex
of one protein with another protein or with a ligand (which is a drug in most cases).

Bresso et al. (2012) [26] point out that the majority of reported structure-based
prediction methods for protein-protein interactions
consider attributes that are physico-chemical properties, such as hydrophobicity and
residue composition, but lack a representation of the properties (e.g., the
accessible surface of a particular residue) of the binding components or of the
spatial relations between two components (residues). Considering these limitations
and the flexibility of FCA, Bresso et al. (2012) [26] used available protein 3D
structures to characterize PBS. In this approach, Inductive Logic Programming (ILP)
was combined with FCA, which enabled the identification and discovery of distinct
binding pockets in protein-protein interactions.


4.3 Discovering Ligand from database as a drug molecule 

Using FCA, several attempts have been made to identify suitable ligands from a number
of chemical databases such as IUPHAR, ZINC and many more. The reports suggest
that FCA helps in the identification of drug molecules. Drug molecules can be
either agonists (activators) or antagonists (inhibitors) of a given protein.
Molecules that do not bind and do not produce any change are not considered drug
molecules. In addition to having suitable ADMETox properties, chemical molecules that
follow Lipinski's rule are considered suitable drugs.

A chemical compound has a number of physical properties (hydrogen bond donors and
acceptors, rotatable bonds, topological surface area, molecular weight, XlogP) and
chemical properties (absorption, distribution, metabolism, excretion, toxicity).
Using these properties as attributes of the object (a ligand that could possibly be a
drug molecule), one can identify such ligands and differentiate them from the bulk of
chemical molecules in a database using FCA. For
example, such an attempt was made by Sugiyama et al. (2012) [27]. They
considered the physical features discussed above, including the number of Lipinski's
rules broken, as attributes in order to identify ligands from the IUPHAR
database. They designed an algorithm, LIFT (Ligand Finding via Formal ConcepT
Analysis), for semi-supervised multi-label classification of mixed-type data.
The algorithm proved to be an effective and efficient classification system
for identifying ligands from the training data. Fragment Formal Concept
Analysis (FragFCA), introduced by Lounkine et al. (2008), is able to
identify selective hits in high-throughput screening data sets. In the design
of FragFCA, combinations of molecular fragments are the 'objects' and their
'attributes' include compound activity and potency information.
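
The idea of turning physical properties into FCA attributes can be sketched as follows. The code counts violations of Lipinski's rule of five (more than 5 hydrogen bond donors, more than 10 acceptors, molecular weight above 500 Da, logP above 5) and stores the count as an attribute. The dictionary keys, the rotatable-bond cut-off and the two molecules are hypothetical, and this is not the LIFT implementation.

```python
def lipinski_violations(mol):
    """Count how many of the four rule-of-five criteria a molecule breaks."""
    broken = [
        mol["h_bond_donors"] > 5,
        mol["h_bond_acceptors"] > 10,
        mol["molecular_weight"] > 500,
        mol["logp"] > 5,
    ]
    return sum(broken)


def ligand_context(molecules):
    """Formal context: ligand name -> set of binary attributes."""
    context = {}
    for name, mol in molecules.items():
        attrs = {f"lipinski_violations={lipinski_violations(mol)}"}
        # Hypothetical extra attribute with an arbitrary cut-off.
        if mol["rotatable_bonds"] <= 10:
            attrs.add("rotatable_bonds<=10")
        context[name] = attrs
    return context


# Hypothetical molecules described by the physical properties named above.
molecules = {
    "lig_small": {"h_bond_donors": 1, "h_bond_acceptors": 4,
                  "molecular_weight": 180.2, "logp": 1.2, "rotatable_bonds": 3},
    "lig_large": {"h_bond_donors": 7, "h_bond_acceptors": 12,
                  "molecular_weight": 720.0, "logp": 3.0, "rotatable_bonds": 14},
}
context = ligand_context(molecules)
```

Once every ligand is described by such binary attributes, the resulting context can be passed to any concept lattice algorithm to group ligands that share the same property profile.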

The effectiveness of drug identification can be improved if the attributes
classifying the ligands are slightly updated so as to single out non-peptide
molecules from the bulk of drug molecules. Peptide molecules have been found to have
limited in vivo efficacy because of pharmacological constraints: solubility,
stability and selectivity. Hence, for reliable and safer drug therapy, the discovery
and optimisation of non-peptide inhibitors/drugs is necessary. Moreover, a
recent in silico identification of drug molecules for cathepsin L (SmCL1) of
Schistosoma mansoni, the organism responsible for the disease schistosomiasis,
revealed that non-peptide molecules could be better drug molecules than
peptide drug molecules [30]. A list of popular FCA-based software tools
is shown in Table 1.

To conclude, FCA provides an excellent framework for dealing with a variety of
problems. Before applying it to biological data, minor optimisations
and a thorough understanding of the domain are needed for
better research.


Table 1 List of FCA-based software tools applied in the biological domain

S.No  Tool Name               Description                                    Reference

1.    ConExp                  Java-based FCA analysis software tool used     Serhiy (2000)
                              to generate the lattice diagram.

2.    FcaStone                Tool for format conversion and command-line    Priss (2008)
                              lattice generation.

3.    Contextual Role Editor  FCA tool that works with the Eclipse           Mühle & Wende
                              modeling tool.                                 (2010) [32]

4.    FcaBedrock              A tool for creating context files for Formal   Andrews & Orphanides
                              Concept Analysis. It can convert existing      (2010) [35]
                              data sets in flat-file CSV or 3-column CSV
                              to Burmeister (.cxt) or FIMI (.dat) context
                              files.

5.    Lattice Miner           FCA tool for the construction, manipulation    Lahcen & Kwuida
                              and visualization of concept lattices.         (2010)

6.    CUBIST                  Gene expression analysis tool that combines    Melo et al. (2013)
                              FCA with association rules and visual
                              analytics. It provides filtering and grouping
                              of large data sets, interactive exploration
                              and various relevant statistics.

7.    Galicia                 Galois lattice interactive constructor.        Galicia [33]

8.    OpenFCA                 A set of tools for performing FCA activities,  Borza et al. (2010)
                              including creation of contexts, visualization
                              of lattices and attribute exploration.

9.    LIFT                    Ligand Finding via Formal ConcepT Analysis     Sugiyama et al.
                              (LIFT) for semi-supervised multi-label         (2012) [27]
                              classification of mixed-type data.

10.   FragFCA                 Fragment Formal Concept Analysis (FragFCA)     Lounkine et al.
                              identifies selective hits in high-throughput   (2008)
                              screening data sets.
5 Conclusions and Discussions

Biological data are growing at an unprecedented rate. High-throughput technologies
have fuelled the production of high-quality biological data. When these data are
analyzed properly, one can discover valuable knowledge hidden inside them.
Formal Concept Analysis (FCA) is a method based on lattice theory for the
analysis of binary relational data and has been found to have potential applications
in many areas of bioinformatics and computational biology, besides other
applications. In this chapter, we presented the current status of FCA for the
analysis of, and knowledge discovery from, biological data, including gene expression
discretization, gene co-expression mining, gene clustering, finding genes in gene
regulatory networks, enzyme/protein classification, binding site classification, and
so on. We also presented a brief list of FCA-based software tools applied in the
biological domain and covered some challenges faced so far.